Introduction

Determining the Effect of Concrete Components on Concrete Properties

Concrete is the most widely used building material in the world today. It is commonly composed of portland cement (“C”), blast furnace slag (“S”), fly ash (“F”), water (“W”), superplasticizer (“SP”), coarse aggregates (“CA”), and fine aggregates (“FA”). The purpose of this research is to determine which concrete components have a significant effect on compressive strength. Using the Ordinary Least Squares method, the regression model was determined to be \(\hat{strength}\) = 89.5305187 + 0.0802058 \(C\) + 0.0675839 \(F\) - 0.1940091 \(W\) - 0.0367414 \(CA\) - 0.0152489 \(FA\). This indicates that the concrete compressive strength increases as the portland cement (“C”) content increases, the fly ash (“F”) content increases, the water (“W”) content decreases, or the aggregate (“CA” and “FA”) content decreases, holding the other variables in the model constant. The model accounts for 88% of the variability in concrete compressive strength in the training data, so it is more informative than predicting with the mean compressive strength alone. Furthermore, upon validation, the model accounts for 90% of the response variability when predicting new concrete strengths and has a low error rate of about 8%, indicating the model is appropriate for predicting concrete compressive strength from the amounts of the concrete components.

The data set was retrieved from UCI Machine Learning Repository on November 12, 2019. Data Reference: Yeh, I-Cheng, “Modeling slump flow of concrete using second-order regressions and artificial neural networks,” Cement and Concrete Composites, Vol.29, No. 6, 474-480, 2007.

Is the compressive strength data normal?

What makes concrete strong?

In this day and age, we are surrounded by a concrete jungle: our buildings, our roads, and our pipelines are all made possible thanks to this wonderful material!

Concrete is a composite material composed of aggregates and cement or, simply put, “rocks glued together”. The beauty of composites is that they have unique properties that the individual components do not possess on their own. The aggregates reinforce the surrounding cement, creating a strong material. But what makes concrete so strong? Is there an optimum mixture of components?

The Concrete Jungle

Image Source: Edward Burtynsky, twistedsifter.com/2012/03/picture-of-the-day-the-concrete-jungle/

Concrete Components

The following components have an effect on the compressive strength (“strength”), flow (“Fl”), and slump (“Sl”) of the concrete material:

  • Portland Cement (“C”)
  • Blast furnace slag (“S”)
  • Fly ash (“F”)
  • Water (“W”)
  • Superplasticizer (“SP”)
  • Coarse aggregates (“CA”)
  • Fine aggregates (“FA”)

The data represents the amount of each component (kilograms) in a cubic meter of concrete.

Concrete Components

Image Source: Paulo Monteiro, UC Berkeley

Concrete Properties

Three concrete properties were investigated given various amounts of concrete components:

Concrete compressive strength (“Strength”)

  • concrete samples were tested in compression until they failed
  • determines the strength of the cured composite
  • reported in megapascals (MPa)

Slump (“Sl”)

  • slump is how much drop there is in the wet concrete during the “Slump-Cone Test”
  • helps understand how easy the wet concrete is to work with
  • measured in meters (m)

Flow (“Fl”)

  • flow is the diameter of the wet concrete cone during the “Slump-Cone Test”
  • helps understand how easy the wet concrete is to work with
  • measured in meters (m)

Concrete Slump Test

Image Source: theconstructor.org/concrete/concrete-slump-test/1558/

Response Variable Exploration

Compressive Strength of Concrete (“Strength”)

  • determines the strength of the cured composite
  • reported in megapascals (MPa)

Compressive Strength has a normal distribution

Slump (“Sl”)

  • slump is how much drop there is in the cone
  • measured in meters (m)

Slump has a left-skewed distribution

Flow (“Fl”)

  • flow is the diameter of the cone
  • measured in meters (m)

Flow has a bimodal distribution

Linear Model of Strength

Concrete Compressive Strength Data

The concrete data was separated into two data sets: 80% of the data was used to train the model (black) and 20% of the data was used to test the model (red).
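
The split described above can be sketched in base R as follows. This is a minimal illustration, not the project's code: `demo_data` is a hypothetical stand-in with 103 simulated rows, and the 80% share is rounded to a whole number of rows.

```r
# Hypothetical stand-in for the concrete data: 103 rows, one numeric column.
set.seed(2019)
n <- 103
demo_data <- data.frame(id = seq_len(n), Strength = rnorm(n, mean = 36, sd = 8))

# Draw 80% of the row indices (rounded to a whole number) for training.
n_train <- round(0.8 * n)
train_index <- sample(seq_len(n), n_train)

# Rows not drawn for training form the test set.
train_data <- demo_data[train_index, ]
test_data <- demo_data[-train_index, ]

# The two subsets partition the data: 82 training rows and 21 test rows.
c(train = nrow(train_data), test = nrow(test_data))
```

Fixing the seed makes the split reproducible, so the same rows land in the test set each time the dashboard is knit.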

Significant Variables Contributing to Concrete Compressive Strength

A linear model relating portland cement (“C”), blast furnace slag (“S”), fly ash (“F”), water (“W”), superplasticizer (“SP”), coarse aggregates (“CA”), and fine aggregates (“FA”) to concrete compressive strength (“Strength”) was fit to the training data.


Call:
lm(formula = Strength ~ C + S + F + W + SP + CA + FA, data = train_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5113 -1.4818 -0.1948  1.2946  7.5805 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 123.25103   79.20910   1.556  0.12397   
C             0.06870    0.02504   2.743  0.00763 **
S            -0.01870    0.03529  -0.530  0.59787   
F             0.05611    0.02566   2.187  0.03190 * 
W            -0.22442    0.08030  -2.795  0.00661 **
SP            0.10348    0.15451   0.670  0.50512   
CA           -0.05005    0.03045  -1.644  0.10450   
FA           -0.03017    0.03224  -0.936  0.35240   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.59 on 74 degrees of freedom
Multiple R-squared:  0.8901,    Adjusted R-squared:  0.8797 
F-statistic: 85.62 on 7 and 74 DF,  p-value: < 2.2e-16

Linear Model of Concrete Compressive Strength

The final model was determined by backward elimination: the variable with the highest t-test p-value was removed and the model was refit, repeating until all remaining variables were significant at the 0.05 significance level. The following variables were determined to have a significant effect on the compressive strength (“Strength”) of concrete (from the t-tests):

  • portland cement (“C”)
  • fly ash (“F”)
  • water (“W”)
  • coarse aggregates (“CA”)
  • fine aggregates (“FA”)
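
The elimination procedure can be automated along the following lines. This is a sketch under stated assumptions rather than the code used for the dashboard: `backward_eliminate` is a hypothetical helper, and the data are simulated so that only `x1` and `x2` truly affect `y`.

```r
# Simulated data (hypothetical): y depends on x1 and x2 only; x3 and x4 are noise.
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 2 + 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    # p-values of the t-tests, dropping the intercept row
    pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    worst <- which.max(pvals)
    # Stop once every predictor is significant (or only one is left)
    if (pvals[worst] <= alpha || length(predictors) == 1) {
      return(fit)
    }
    # Remove the least significant predictor and refit
    predictors <- setdiff(predictors, rownames(pvals)[worst])
  }
}

final_fit <- backward_eliminate(sim, "y")
names(coef(final_fit))
```

Only one variable is dropped per pass because removing a regressor changes the p-values of those that remain.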

The resulting linear model is:

\(\hat{strength}=\) 89.5305187 + 0.0802058\(C\) + 0.0675839\(F\) -0.1940091\(W\) -0.0367414\(CA\) -0.0152489\(FA\).

Concrete strength increases as:

  • portland cement (“C”) content increases
  • fly ash (“F”) content increases
  • water (“W”) content decreases
  • coarse aggregates (“CA”) content decreases
  • fine aggregates (“FA”) content decreases
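
To see how the coefficients translate into a prediction, the sketch below evaluates the fitted equation for one mix. The mix quantities are hypothetical values chosen only for illustration, and `predict_strength` is an illustrative helper, not part of the original analysis.

```r
# Coefficients copied from the fitted model reported above.
coefs <- c(Intercept = 89.5305187, C = 0.0802058, F = 0.0675839,
           W = -0.1940091, CA = -0.0367414, FA = -0.0152489)

# Hypothetical mix design (kg per cubic meter), chosen only for illustration.
mix <- c(C = 300, F = 100, W = 200, CA = 900, FA = 750)

predict_strength <- function(mix, coefs) {
  unname(coefs["Intercept"] + sum(coefs[names(mix)] * mix))
}

baseline <- predict_strength(mix, coefs)                    # about 37.04 MPa
wetter <- predict_strength(mix + c(0, 0, 10, 0, 0), coefs)  # 10 kg more water
wetter - baseline                                           # about -1.94 MPa
```

Adding 10 kg of water while holding everything else fixed lowers the predicted strength by ten times the water coefficient, which is what "holding the other variables constant" means in practice.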

How good is this linear model at predicting the test data?

Using the training data, 87.95% of the variability in strength is accounted for using the model.

Using the test data to validate the model,

  • 90.37% of the variability in strength is accounted for by the model
  • the error rate (RMSE divided by the mean test strength) is 7.96%
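
The two validation metrics can be reproduced in base R as sketched below. The observed and predicted values are hypothetical, and the R-squared here is the squared correlation between predictions and observations, which is the default definition in `caret::R2`.

```r
# Hypothetical observed and predicted strengths (MPa), for illustration only.
observed <- c(30, 35, 40, 45, 50)
predicted <- c(32, 34, 41, 43, 52)

# R-squared as the squared correlation between predictions and observations.
r_squared <- cor(predicted, observed)^2

# Root mean squared error, then scaled by the mean observed strength
# to express the typical prediction error as a fraction of the response.
rmse <- sqrt(mean((predicted - observed)^2))
error_rate <- rmse / mean(observed)

round(c(R2_percent = 100 * r_squared, error_percent = 100 * error_rate), 2)
```

Dividing the RMSE by the mean response makes the error unitless, which is why it can be reported as a percentage.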

Model Conditions

Summary

For the regression model to be appropriate, the following assumptions or conditions must be valid:

  1. Linearity
  2. Zero Mean
  3. Independence
  4. Equal Variance
  5. Normality

Additionally, influential points and collinearity should be assessed.

Conclusion

From reviewing the conditions, the linear model is appropriate as the conditions are valid and there are no influential points or collinearity between variables.

1. Linearity Condition

Using the Residuals vs. Fitted Plot, there are no distinct patterns, which indicates a linear relationship between the concrete compressive strength (\(strength\)) and the regressors: portland cement (\(C\)), fly ash (\(F\)), water (\(W\)), coarse aggregates (\(CA\)), and fine aggregates (\(FA\)).

2. Zero Mean Condition

In Ordinary Least Squares (OLS) regression with an intercept, the fitted residuals always sum to zero, so their mean is zero by construction and this condition is always satisfied.
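
The role of the intercept in this condition can be checked numerically. The sketch below uses simulated data (an assumption, not the concrete data) and compares a fit with an intercept against one forced through the origin.

```r
# Simulated data with a true intercept of 5 (hypothetical values).
set.seed(42)
x <- runif(50, 0, 10)
y <- 5 + 2 * x + rnorm(50)

fit_with_intercept <- lm(y ~ x)
fit_no_intercept <- lm(y ~ 0 + x)

mean(residuals(fit_with_intercept))  # zero up to floating-point error
mean(residuals(fit_no_intercept))    # generally not zero
```

The zero-mean property comes from the intercept column in the design matrix, so it holds for the concrete model, which includes an intercept.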

3. Independence Condition

Assuming the data is indexed in the order it was collected, the Indexed Residuals Plot indicates the residuals have no specific patterns, suggesting the residuals are independent. Without knowing the order in which the data was collected, the independent condition cannot be confirmed.

4. Equal Variance (aka Constant Variance) Condition

Using the Scale-Location Plot, the points appear to be random with no indication of a pattern, suggesting the constant variance condition is satisfied.

5. Normality Condition

Using the Q-Q Plot, the standardized residuals lie along the 45 degree line, indicating the normality condition is satisfied.

Influential Points

Using the Cook’s D Plot, no points have Cook’s D values greater than 1, suggesting there are no influential points. Influential points can be an issue because they affect the model estimates. When such points appear, it is important to determine whether they are “bad” values from mistakes in data collection or whether they represent actual values. If they are determined to be mistakes, they can be removed from the sample data, but the removal should be mentioned in the analysis.
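
The same check can be done numerically instead of reading it off the plot. The sketch below uses simulated data with one deliberately planted high-leverage outlier (row 31), so the values are hypothetical; `cooks.distance()` is the base R function behind the plot.

```r
# Simulated data: 30 well-behaved points plus one planted outlier.
set.seed(7)
x <- c(rnorm(30), 8)                  # last point has high leverage
y <- c(2 * x[1:30] + rnorm(30), -10)  # and sits far from the trend
fit <- lm(y ~ x)

# Cook's distance for every observation; flag those above the cutoff of 1.
d <- cooks.distance(fit)
flagged <- which(d > 1)
flagged
```

Applied to the concrete model, an empty result from `which()` would correspond to the conclusion that no points are influential.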

Collinearity

In addition to linearity between regressors and the response, it is also important to assess the near linear dependence among regressors.

       C        F        W       CA       FA 
1.287230 1.280103 1.375916 1.611531 1.277268 

From the Variance Inflation Factors (VIF) for the five regressors, none of the square roots of the VIF values is greater than 2, suggesting collinearity is not present.
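
For reference, the VIF of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the others. The sketch below computes it by hand on simulated regressors (hypothetical data); `vif_by_hand` is an illustrative name, not a function from the `car` package.

```r
# Simulated regressors (hypothetical): x2 is mildly correlated with x1.
set.seed(3)
n <- 100
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the others.
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

sqrt(vif_by_hand)
```

The square root of the VIF is the factor by which collinearity inflates a coefficient's standard error, which is why it is compared with the cutoff of 2.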

Future Work

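  • investigating transformations to model the effect of concrete components on concrete slump and flow for the cone test
  • looking into other models that may be more appropriate to model the concrete compression data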

---
title: "MTH 542 Prj Concrete"
author: "K.M. Burzynski"
output: 
  flexdashboard::flex_dashboard:
    theme: cosmo
    orientation: columns
    social: ["facebook", "twitter", "linkedin"]
    source_code: embed
---

```{r setup, include=FALSE}
# load necessary packages
library(caret)
library(car)
library(ggplot2)
library(plotly)
library(plyr)
library(flexdashboard)  ## you need this package to create dashboard

# read the concrete data set (this absolute path is specific to the author's machine)
concretedata <- read.csv("/Users/katherineburzynski/Documents/MTH 543 - Linear Regression/Project/Concrete_test.csv")

```

Introduction
=======================================================================

Column {data-width=600}
-----------------------------------------------------------------------

### Determining the Effect of Concrete Components on Concrete Properties

#### **Determining the Effect of Concrete Components on Concrete Properties**

Concrete is the most widely used building material in the world today. It is commonly composed of portland cement (“C”), blast furnace slag (“S”), fly ash (“F”), water (“W”), superplasticizer (“SP”), coarse aggregates (“CA”), and fine aggregates (“FA”). The purpose of this research is to determine which concrete components have a significant effect on compressive strength. Using the Ordinary Least Squares method, the regression model was determined to be $\hat{strength}$ = 89.5305187 + 0.0802058 $C$ + 0.0675839 $F$ - 0.1940091 $W$ - 0.0367414 $CA$ - 0.0152489 $FA$. This indicates that the concrete compressive strength increases as the portland cement (“C”) content increases, the fly ash (“F”) content increases, the water (“W”) content decreases, or the aggregate ("CA" and "FA") content decreases, holding the other variables in the model constant. The model accounts for 88% of the variability in concrete compressive strength in the training data, so it is more informative than predicting with the mean compressive strength alone. Furthermore, upon validation, the model accounts for 90% of the response variability when predicting new concrete strengths and has a low error rate of about 8%, indicating the model is appropriate for predicting concrete compressive strength from the amounts of the concrete components.


The data set was retrieved from *UCI Machine Learning Repository* on November 12, 2019.
**Data Reference:**
Yeh, I-Cheng, "Modeling slump flow of concrete using second-order regressions and artificial neural networks," Cement and Concrete Composites, Vol.29, No. 6, 474-480, 2007.


### Is the compressive strength data normal?

```{r}
# Histogram of compressive strength; plot_ly() already returns an interactive widget
histstrength <- plot_ly(concretedata, x = ~Strength, type = "histogram")
histstrength
```

Column {.tabset data-width=400} 
-----------------------------------------------------------------------

### What makes concrete strong?
In this day and age, we are surrounded by a concrete jungle: our buildings, our roads, and our pipelines are all made possible thanks to this wonderful material!

Concrete is a composite material composed of aggregates and cement or, simply put, *"rocks glued together"*. The beauty of composites is that they have unique properties that the individual components do not possess on their own. The aggregates reinforce the surrounding cement, creating a strong material. But what makes concrete so strong? Is there an optimum mixture of components?

![The Concrete Jungle](/Users/katherineburzynski/Documents/MTH 543 - Linear Regression/Project/judge-harry-pregerson-inerchange.jpg)
*Image Source: Edward Burtynsky, twistedsifter.com/2012/03/picture-of-the-day-the-concrete-jungle/*

### Concrete Components

The following components have an effect on the compressive strength ("strength"), flow ("Fl"), and slump ("Sl") of the concrete material:

* Portland Cement (“C”)
* Blast furnace slag (“S”)
* Fly ash (“F”)
* Water (“W”)
* Superplasticizer (“SP”)
* Coarse aggregates (“CA”)
* Fine aggregates (“FA”)

The data represents the amount of each component (kilograms) in a cubic meter of concrete.

![Concrete Components](/Users/katherineburzynski/Documents/MTH 543 - Linear Regression/Project/concrete_microstructure.jpg)
*Image Source: Paulo Monteiro, UC Berkeley*

### Concrete Properties

Three concrete properties were investigated given various amounts of concrete components:

**Concrete compressive strength ("Strength")**

* concrete samples were tested in compression until they failed
* determines the strength of the cured composite
* reported in megapascals (MPa)

**Slump ("Sl")**

* slump is how much drop there is in the wet concrete during the "Slump-Cone Test"
* helps understand how easy the wet concrete is to work with
* measured in meters (m)

**Flow ("Fl")**

* flow is the diameter of the wet concrete cone during the "Slump-Cone Test"
* helps understand how easy the wet concrete is to work with
* measured in meters (m)

![Concrete Slump Test](/Users/katherineburzynski/Documents/MTH 543 - Linear Regression/Project/concrete-slum-test-procedure-results.jpg) *Image Source: theconstructor.org/concrete/concrete-slump-test/1558/*

Response Variable Exploration
=======================================================================

Column
-----------------------------------------------------------------------

### Compressive Strength of Concrete ("Strength")

* determines the strength of the cured composite
* reported in megapascals (MPa)

```{r}
# plot_ly() already returns an interactive widget, so no ggplotly() call is needed
boxstrength <- plot_ly(concretedata, x = ~Strength, type = "box")
boxstrength
```

### Compressive Strength has a normal distribution

```{r}
histstrength <- plot_ly(concretedata, x = ~Strength, type = "histogram")
histstrength
```

Column
-----------------------------------------------------------------------

### Slump ("Sl")

* slump is how much drop there is in the cone
* measured in meters (m)

```{r}
boxslump <- plot_ly(concretedata, x = ~Sl, type = "box")
boxslump
```

### Slump has a left-skewed distribution

```{r}
histslump <- plot_ly(concretedata, x = ~Sl, type = "histogram")
histslump
```

Column
-----------------------------------------------------------------------

### Flow ("Fl")

* flow is the diameter of the cone
* measured in meters (m)

```{r}
boxflow <- plot_ly(concretedata, x = ~Fl, type = "box")
boxflow
```

### Flow has a bimodal distribution

```{r}
histflow <- plot_ly(concretedata, x = ~Fl, type = "histogram")
histflow
```

Linear Model of Strength
=======================================================================

Column
-----------------------------------------------------------------------

### Concrete Compressive Strength Data

```{r}
# 82 of the 103 rows (about 80%) are drawn at random for training
set.seed(2019)
train_index <- sample(1:103, 82)
train_data <- concretedata[train_index, ]
test_data <- concretedata[-train_index, ]
```
```{r}
boxplot(train_data$Strength, test_data$Strength,
        main = "Concrete Strength for Validation",
        names = c("train", "test"), horizontal = TRUE,
        xlab = "Concrete Strength (MPa)", col = c("black", "red"))
```

The concrete data was separated into two data sets: 80% of the data was used to train the model (black) and 20% of the data was used to test the model (red).

### Significant Variables Contributing to Concrete Compressive Strength

A linear model relating portland cement (“C”), blast furnace slag (“S”), fly ash (“F”), water (“W”), superplasticizer (“SP”), coarse aggregates (“CA”), and fine aggregates (“FA”) to concrete compressive strength ("Strength") was fit to the training data.

```{r}
cstrength=lm(Strength~C+S+F+W+SP+CA+FA,train_data)
summary(cstrength)
```

Column
-----------------------------------------------------------------------

### Linear Model of Concrete Compressive Strength

The final model was determined by backward elimination: the variable with the highest t-test p-value was removed and the model was refit, repeating until all remaining variables were significant at the 0.05 significance level. The following variables were determined to have a significant effect on the compressive strength ("Strength") of concrete (from the t-tests):

* portland cement (“C”)
* fly ash (“F”)
* water (“W”)
* coarse aggregates (“CA”)
* fine aggregates (“FA”)

```{r}
css=lm(Strength~C+F+W+CA+FA,train_data)
```

**The resulting linear model is:**

#### $\hat{strength}=$ `r css$coefficients[1]` + `r css$coefficients[2]`$C$ + `r css$coefficients[3]`$F$ `r css$coefficients[4]`$W$ `r css$coefficients[5]`$CA$ `r css$coefficients[6]`$FA$.

**Concrete strength increases as:**

* portland cement (“C”) content increases
* fly ash (“F”) content increases
* water (“W”) content decreases
* coarse aggregates (“CA”) content decreases
* fine aggregates (“FA”) content decreases

#### **How good is this linear model at predicting the test data?**

Using the training data, 87.95% of the variability in strength is accounted for using the model.

```{r}
# Predict the held-out strengths, then compute R-squared and the RMSE relative to the mean
prd=predict(css,test_data)
Rsq=R2(prd,test_data$Strength)
Rootmean=RMSE(prd,test_data$Strength)/mean(test_data$Strength)
```

Using the test data to validate the model,

* `r Rsq*100` % of the variability in strength is accounted for by the model
* the error rate (RMSE divided by the mean test strength) is `r Rootmean*100` %

Model Conditions
=======================================================================

Column {.tabset data-width=400}
-----------------------------------------------------------------------

### Summary

For the regression model to be appropriate, the following assumptions or conditions must be valid:

1. Linearity
2. Zero Mean
3. Independence
4. Equal Variance
5. Normality

Additionally, influential points and collinearity should be assessed.

*Conclusion*

From reviewing the conditions, the linear model is appropriate as the conditions are valid and there are no influential points or collinearity between variables.

### 1. Linearity Condition

```{r}
plot(css,1)
```

Using the ***Residuals vs. Fitted Plot***, there are no distinct patterns, which indicates a linear relationship between the concrete compressive strength ($strength$) and the regressors: portland cement ($C$), fly ash ($F$), water ($W$), coarse aggregates ($CA$), and fine aggregates ($FA$).

### 2. Zero Mean Condition

In Ordinary Least Squares (OLS) regression with an intercept, the fitted residuals always sum to zero, so their mean is zero by construction and this condition is always satisfied.

### 3. Independence Condition

```{r}
plot(css$residuals, main="Indexed Residuals",ylab="residual")
```

Assuming the data is indexed in the order it was collected, the ***Indexed Residuals Plot*** indicates the residuals have no specific patterns, suggesting the residuals are independent. Without knowing the order in which the data was collected, the independence condition cannot be confirmed.

### 4. Equal Variance (aka Constant Variance) Condition

```{r}
plot(css,3)
```

Using the ***Scale-Location Plot***, the points appear to be random with no indication of a pattern, suggesting the constant variance condition is satisfied.

### 5. Normality Condition

```{r}
plot(css,2)
```

Using the ***Q-Q Plot***, the standardized residuals lie along the 45 degree line, indicating the normality condition is satisfied.

### Influential Points

```{r}
plot(css,4)
```

Using the *Cook's D Plot*, no points have Cook's D values greater than 1, suggesting there are no influential points. Influential points can be an issue because they affect the model estimates. When such points appear, it is important to determine whether they are "bad" values from mistakes in data collection or whether they represent actual values. If they are determined to be mistakes, they can be removed from the sample data, but the removal should be mentioned in the analysis.

### Collinearity

In addition to linearity between regressors and the response, it is also important to assess the near linear dependence among regressors.

```{r}
sqrt(vif(css))
```

From the *Variance Inflation Factors* (VIF) for the five regressors, none of the square roots of the VIF values is greater than 2, suggesting collinearity is not present.

Future Work
=======================================================================

Future Work

* investigating transformations to model the effect of concrete components on concrete slump and flow for the cone test
* looking into other models that may be more appropriate to model the concrete compression data